Comet: Batched Stream Processing in Data Intensive Distributed Computing

Authors

  • Bingsheng He
  • Mao Yang
  • Zhenyu Guo
  • Rishan Chen
  • Bing Su
  • Wei Lin
  • Lidong Zhou
Abstract

Performance and resource optimization is an important research problem in data intensive distributed computing. We present a new batched stream processing model that captures query correlations to expose I/O and computation redundancies for optimizations. The model is inspired by our empirical study on a trace from a production large-scale data processing cluster, which reveals significant redundancies caused by strong temporal and spatial correlations among queries. We have developed Comet, a query processing system that embraces the batched stream processing model for optimizations. We have integrated Comet with DryadLINQ. With its roots in query optimizations for database systems, Comet enables a set of new heuristics and opportunities tailored for distributed computing in DryadLINQ. Optimizations in Comet are effective. The evaluation of a micro-benchmark on a 40-machine cluster shows a 42% reduction in total machine time and over 40% reduction in total I/O. Our simulation on a real trace covering over 19 million machine hours shows an estimated I/O saving of over 50%.
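The idea of exploiting query correlations to remove redundant I/O can be illustrated with a toy shared-scan sketch. This is not Comet's implementation or the DryadLINQ API; the query representation, the function names `run_separately` and `run_batched`, and the in-memory "streams" are all assumptions made for illustration. The sketch only shows that queries reading the same input can share a single scan.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical query representation (an assumption, not Comet's):
# (query name, input stream name, per-record function whose results are summed).
Query = Tuple[str, str, Callable[[int], int]]


def run_separately(queries: List[Query], streams: Dict[str, List[int]]):
    """Baseline: every query scans its input independently.

    Returns (results, records_read).
    """
    results = {}
    records_read = 0
    for name, stream, fn in queries:
        total = 0
        for record in streams[stream]:
            records_read += 1  # one read per query per record
            total += fn(record)
        results[name] = total
    return results, records_read


def run_batched(queries: List[Query], streams: Dict[str, List[int]]):
    """Group queries by input stream and share one scan per stream.

    Returns (results, records_read).
    """
    by_stream = defaultdict(list)
    for name, stream, fn in queries:
        by_stream[stream].append((name, fn))

    results = {name: 0 for name, _, _ in queries}
    records_read = 0
    for stream, group in by_stream.items():
        for record in streams[stream]:
            records_read += 1  # one read per record, shared by the group
            for name, fn in group:
                results[name] += fn(record)
    return results, records_read
```

With three queries over one four-record stream, the baseline reads 12 records while the batched version reads 4 and produces the same results. Real systems must also handle queries that arrive at different times and differ in more than their final aggregation, which this sketch ignores.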

Related resources

Wave Computing in the Cloud

We introduce the new Wave model for exposing the temporal relationship among the queries in data-intensive distributed computing. The model defines the notion of query series to capture the recurrent nature of batched computation on periodically updated input streams. This seemingly simple concept captures a significant portion of the queries we observed in a production system. The recurring na...

Enriching Batched Stream processing using Bayesian Networks for Web services

Secure data transactions have become a necessity of our time. Medical records, financial records, legal information and payment gateways all need a secure data transaction process. Several methods have been proposed to perform secure, fast and scalable data transactions in web services. As web servers deal with a huge number of queries, it becomes really difficult t...

Elastic Resource Provisioning for Batched Stream Processing System in Container Cloud

Batched stream processing systems achieve higher throughput than traditional stream processing systems while providing a low latency guarantee. Recently, batched stream processing systems have tended to be deployed in the cloud because of their requirements for elasticity and cost efficiency. However, the performance of batched stream processing systems is hardly guaranteed in the cloud because static resource provi...

Contract-Based Load Management in Federated Distributed Systems

This paper focuses on load management in loosely coupled federated distributed systems. We present a distributed mechanism for moving load between autonomous participants using bilateral contracts that are negotiated offline and that set bounded prices for moving load. We show that our mechanism has good incentive properties, efficiently redistributes excess load, and has a low overhead in pract...

An Internet-Wide Distributed System for Data-Stream Processing

The ubiquity of the Internet has stimulated the development of data- rather than processor-intensive applications. Such data-intensive applications include streaming media, interactive distance learning, and live web-casts. While plenty of research has focused on the real-time delivery of media streams to various subscribers, no solutions have been proposed that provide per-subscriber QoS guarant...

Publication date: 2009